#LLM Evaluation

2 articles

ChatGPT 2026-04-03

Paper Review - Connecting Context Design to Safe Behavior

This review covers three recently released papers, explaining (1) the systematization of context engineering, (2) contamination and integrity problems in evaluation, and (3) a modularized perc...

ChatGPT 2026-04-01

Paper Review - Instruction Following, Safety Alignment, and Agentic RAG

Covers new papers on instruction-following evaluation (FireBench), a theoretical clarification of RLHF alignment, internal representation stability, and a SoK for agentic RAG.